Mortgage Loan Rates Forecast Model

Data Engineering

Import Libraries

# Load API key and secret from environment variables
from dotenv import load_dotenv
load_dotenv()

# System libraries
import glob

# ML libraries
import pandas as pd

# ValidMind libraries 
import validmind as vm

# Plotting libraries 

Data Collection

Load FRED Data

def merge_fred_csv_files(file_pattern):
    # Use glob to find all files matching the specified pattern
    file_list = glob.glob(file_pattern)

    # Initialize an empty list to store individual DataFrames
    dataframes = []

    # Iterate through each file in the file list
    for file in file_list:
        # Read the CSV file into a DataFrame
        df = pd.read_csv(file, parse_dates=['DATE'], index_col='DATE')

        # Add the DataFrame to the list of DataFrames
        dataframes.append(df)

    # Merge all the DataFrames in the list into a single DataFrame
    merged_df = pd.concat(dataframes, axis=1)

    return merged_df
file_path = '../datasets/fred/*.csv'
fred_df = merge_fred_csv_files(file_path)
display(fred_df)
GDPC1 GS5 GS10 GS3 MORTGAGE30US UNRATE CPIAUCSL FEDFUNDS GDP
DATE
1947-01-01 2034.450 NaN NaN NaN NaN NaN 21.48 NaN 243.164
1947-02-01 NaN NaN NaN NaN NaN NaN 21.62 NaN NaN
1947-03-01 NaN NaN NaN NaN NaN NaN 22.00 NaN NaN
1947-04-01 2029.024 NaN NaN NaN NaN NaN 22.00 NaN 245.968
1947-05-01 NaN NaN NaN NaN NaN NaN 21.95 NaN NaN
... ... ... ... ... ... ... ... ... ...
2023-04-01 NaN NaN 3.46 NaN NaN NaN NaN NaN NaN
2023-04-06 NaN NaN NaN NaN 6.28 NaN NaN NaN NaN
2023-04-13 NaN NaN NaN NaN 6.27 NaN NaN NaN NaN
2023-04-20 NaN NaN NaN NaN 6.39 NaN NaN NaN NaN
2023-04-27 NaN NaN NaN NaN 6.43 NaN NaN NaN NaN

3551 rows × 9 columns

Preselection of Variables

target_column = ['MORTGAGE30US']
feature_columns = ['UNRATE', 'GS10', 'FEDFUNDS']
fred_df = fred_df[target_column + feature_columns]
display(fred_df)
MORTGAGE30US UNRATE GS10 FEDFUNDS
DATE
1947-01-01 NaN NaN NaN NaN
1947-02-01 NaN NaN NaN NaN
1947-03-01 NaN NaN NaN NaN
1947-04-01 NaN NaN NaN NaN
1947-05-01 NaN NaN NaN NaN
... ... ... ... ...
2023-04-01 NaN NaN 3.46 NaN
2023-04-06 6.28 NaN NaN NaN
2023-04-13 6.27 NaN NaN NaN
2023-04-20 6.39 NaN NaN NaN
2023-04-27 6.43 NaN NaN NaN

3551 rows × 4 columns

ValidMind Setup

vm.init(
  api_host = "http://localhost:3000/api/v1/tracking",
  api_key = "e22b89a6b9c2a27da47cb0a09febc001",
  api_secret = "a61be901b5596e3c528d94231e4a3c504ef0bb803d16815f8dfd6857fac03e57",
  project = "clgo0g0rt0000fjy6ozl9pb69"
)
  
True
df = fred_df
vm_dataset = vm.init_dataset(dataset=df)
Pandas dataset detected. Initializing VM Dataset instance...
Inferring dataset types...

Data Description

Data Quality

Frequency of the Series

Handling Frequencies

df = df.resample('MS').last()
vm_dataset = vm.init_dataset(dataset=df)
Pandas dataset detected. Initializing VM Dataset instance...
Inferring dataset types...

Exploratory Data Analysis

Univariate Analysis

vm.test_plans.describe_plan("time_series_univariate")
Attribute Value
ID time_series_univariate
Name TimeSeriesUnivariate
Description Test plan to perform time series univariate analysis.
Required Context['dataset']
Tests TimeSeriesLinePlot (Metric), TimeSeriesHistogram (Metric), ACFandPACFPlot (Metric), SeasonalDecompose (Metric), AutoSeasonality (Metric), AutoStationarity (Metric), RollingStatsPlot (Metric), AutoAR (Metric), AutoMA (Metric)
Test Plans []
test_plan_config = {
    "time_series_line_plot": {
        "columns": target_column + feature_columns
    },
    "time_series_histogram": {
        "columns": target_column + feature_columns
    },
    "acf_pacf_plot": {
        "columns": target_column + feature_columns
    },
    "auto_ar": {
        "max_ar_order": 3
    },
    "auto_ma": {
        "max_ma_order": 3
    },
    "seasonal_decompose": {
        "seasonal_model": 'additive',
         "fig_size": (40,30)
    },
    "auto_seasonality": {
        "min_period": 1,
        "max_period": 3
    },
      "auto_stationarity": {
        "max_order": 3,
        "threshold": 0.05
    },
      "rolling_stats_plot": {
        "window_size": 12    
    },
}
vm.run_test_plan("time_series_univariate", config=test_plan_config, dataset=vm_dataset)
Running Metric: acf_pacf_plot:  22%|██▏       | 2/9 [00:00<00:01,  3.58it/s]        The default method 'yw' can produce PACF values outside of the [-1,1] interval. After 0.13, the default will change tounadjusted Yule-Walker ('ywm'). You can use this method now by setting method='ywm'.
Running Metric: seasonal_decompose:  33%|███▎      | 3/9 [00:01<00:02,  2.37it/s]The default method 'yw' can produce PACF values outside of the [-1,1] interval. After 0.13, the default will change tounadjusted Yule-Walker ('ywm'). You can use this method now by setting method='ywm'.
The default method 'yw' can produce PACF values outside of the [-1,1] interval. After 0.13, the default will change tounadjusted Yule-Walker ('ywm'). You can use this method now by setting method='ywm'.
The default method 'yw' can produce PACF values outside of the [-1,1] interval. After 0.13, the default will change tounadjusted Yule-Walker ('ywm'). You can use this method now by setting method='ywm'.
The default method 'yw' can produce PACF values outside of the [-1,1] interval. After 0.13, the default will change tounadjusted Yule-Walker ('ywm'). You can use this method now by setting method='ywm'.
Running Metric: auto_ma:  89%|████████▉ | 8/9 [00:04<00:00,  2.48it/s]           
Warning: MORTGAGE30US is not stationary. Results may be inaccurate.
Warning: GS10 is not stationary. Results may be inaccurate.
Warning: MORTGAGE30US is not stationary. Results may be inaccurate.
Non-invertible starting MA parameters found. Using zeros as starting parameters.
Non-invertible starting MA parameters found. Using zeros as starting parameters.
Warning: GS10 is not stationary. Results may be inaccurate.
Non-invertible starting MA parameters found. Using zeros as starting parameters.
Non-invertible starting MA parameters found. Using zeros as starting parameters.
                                                                                                                                    

Results for Time Series Univariate Test Plan:


This test plan provides a preliminary understanding of the target variable(s) used in the time series dataset. It visualizations that present the raw time series data and a histogram of the target variable(s). The raw time series data provides a visual inspection of the target variable's behavior over time. This helps to identify any patterns or trends in the data, as well as any potential outliers or anomalies. The histogram of the target variable displays the distribution of values, providing insight into the range and frequency of values observed in the data.

Logged the following plots to the ValidMind platform:

Metric Plots

Logged the following plots to the ValidMind platform:

Metric Plots

Logged the following plots to the ValidMind platform:

Metric Plots

Logged the following dataset metric to the ValidMind platform:

Metric Name
seasonal_decompose
Metric Type
dataset
Metric Scope
Metric Value
{'MORTGAGE30US': [{'Date': '1971-04-01', 'MORTGAGE30US': 7.29, 'trend': nan, 'seasonal': 0.06226307189542485, 'resid': nan}, {'Date': '1971-05-01', 'MORTGAGE30US': 7.46, 'trend': nan, 'seasonal': 0.04249183006535937, 'resid': nan}, {'Date': '1971-06-01', 'MORTGAGE30US': 7.54, 'trend': nan, 'seasonal': 0.038235294117647194, 'resid': nan}, {'Date': '1971-07-01', 'MORTGAGE30US': 7.69, 'trend': nan, 'seasonal': 0.03680555555555551, 'resid': nan}, {'Date': '1971-08-01', 'MORTGAGE30US': 7.69, 'trend': nan, 'seasonal': 0.03517156862745089, 'resid': nan}, {'Date': '1971-09-01', 'MORTGAGE30US': 7.67, 'trend': nan, 'seasonal': 0.04834150326797367, 'resid': nan}, {'Date': '1971-10-01', 'MORTGAGE30US': 7.63, 'trend': 7.493333333333333, 'seasonal': 0.02809640522875831, 'resid': 0.1085702614379084}, {'Date': '1971-11-01', 'MORTGAGE30US': 7.51, 'trend': 7.492500000000001, 'seasonal': -0.03171568627450976, 'resid': 0.049215686274508945}, {'Date': '1971-12-01', 'MORTGAGE30US': 7.48, 'trend': 7.483333333333333, 'seasonal': -0....
Metric Plots

Logged the following dataset metric to the ValidMind platform:

Metric Name
auto_seasonality
Metric Type
dataset
Metric Scope
Metric Value
[{'Variable': 'MORTGAGE30US', 'Seasonal Periods': [1, 2, 3], 'Residual Errors': [0.0, 0.05984581260247153, 0.07960924266045578], 'Best Period': 1, 'Decision': 'Not Seasonality'}, {'Variable': 'UNRATE', 'Seasonal Periods': [1, 2, 3], 'Residual Errors': [0.0, 0.0575243050684972, 0.07740261227595713], 'Best Period': 1, 'Decision': 'Not Seasonality'}, {'Variable': 'GS10', 'Seasonal Periods': [1, 2, 3], 'Residual Errors': [0.0, 0.054467933471757865, 0.0724132946653997], 'Best Period': 1, 'Decision': 'Not Seasonality'}, {'Variable': 'FEDFUNDS', 'Seasonal Periods': [1, 2, 3], 'Residual Errors': [0.0, 0.06311610434019356, 0.08459744704273267], 'Best Period': 1, 'Decision': 'Not Seasonality'}]

Logged the following dataset metric to the ValidMind platform:

Metric Name
auto_stationarity
Metric Type
dataset
Metric Scope
Metric Value
[{'Variable': 'MORTGAGE30US', 'Integration Order': 0, 'Test': 'ADF', 'p-value': 0.6719476319623869, 'Threshold': 0.05, 'Pass/Fail': 'Fail', 'Decision': 'Non-stationary'}, {'Variable': 'MORTGAGE30US', 'Integration Order': 1, 'Test': 'ADF', 'p-value': 0.6719476319623869, 'Threshold': 0.05, 'Pass/Fail': 'Fail', 'Decision': 'Non-stationary'}, {'Variable': 'MORTGAGE30US', 'Integration Order': 2, 'Test': 'ADF', 'p-value': 2.1564529205869017e-30, 'Threshold': 0.05, 'Pass/Fail': 'Pass', 'Decision': 'Stationary'}, {'Variable': 'UNRATE', 'Integration Order': 0, 'Test': 'ADF', 'p-value': 0.019395286661891855, 'Threshold': 0.05, 'Pass/Fail': 'Pass', 'Decision': 'Stationary'}, {'Variable': 'GS10', 'Integration Order': 0, 'Test': 'ADF', 'p-value': 0.7099537095912242, 'Threshold': 0.05, 'Pass/Fail': 'Fail', 'Decision': 'Non-stationary'}, {'Variable': 'GS10', 'Integration Order': 1, 'Test': 'ADF', 'p-value': 0.7099537095912242, 'Threshold': 0.05, 'Pass/Fail': 'Fail', 'Decision': 'Non-stationary'}, {'Variable': 'GS10', 'Integ...

Logged the following plots to the ValidMind platform:

Metric Plots

Logged the following dataset metric to the ValidMind platform:

Metric Name
auto_ar
Metric Type
dataset
Metric Scope
Metric Value
[{'Variable': 'MORTGAGE30US', 'AR orders': [0, 1, 2, 3], 'BIC': [3265.668035138847, 304.79491696472724, 263.77701070543054, 246.044177168987], 'AIC': [3256.792531839374, 291.48646585961893, 246.03882463028074, 223.8794767052911]}, {'Variable': 'UNRATE', 'AR orders': [0, 1, 2, 3], 'BIC': [3539.1415392364793, 994.0162401519799, 997.2110677147793, 1003.5533561307045], 'AIC': [3529.5300941296455, 979.6023965917921, 977.997046684346, 979.541382314083]}, {'Variable': 'GS10', 'AR orders': [0, 1, 2, 3], 'BIC': [4201.0117145646345, 165.51199536334443, 87.41879025069926, 59.33207701504988], 'AIC': [4191.542531244689, 151.31178968783234, 68.48994742483043, 35.676986512639466]}, {'Variable': 'FEDFUNDS', 'AR orders': [0, 1, 2, 3], 'BIC': [4472.44765881503, 1177.7641357199707, 1045.0699658674368, 1033.6967271958817], 'AIC': [4463.01689204236, 1163.6216241302423, 1026.2181410647286, 1010.1380252206008]}]

Logged the following dataset metric to the ValidMind platform:

Metric Name
auto_ma
Metric Type
dataset
Metric Scope
Metric Value
[{'Variable': 'MORTGAGE30US', 'MA orders': [0, 1, 2, 3], 'BIC': [3265.6680351445043, 2460.523444192, 1832.5431349176386, 1397.450804157016], 'AIC': [3256.7925318450316, 2447.210189242791, 1814.792128318693, 1375.262045908334]}, {'Variable': 'UNRATE', 'MA orders': [0, 1, 2, 3], 'BIC': [3539.1415392765507, 2552.802126524599, 2043.3637002930723, 1682.571708624607], 'AIC': [3529.530094169717, 2538.384958864348, 2024.1408100794042, 1658.543095857522]}, {'Variable': 'GS10', 'MA orders': [0, 1, 2, 3], 'BIC': [4201.01171483049, 3104.303501753798, 2232.893221459788, 1672.0767163659268], 'AIC': [4191.542531510544, 3090.0997267738794, 2213.9548548198964, 1648.403758066062]}, {'Variable': 'FEDFUNDS', 'MA orders': [0, 1, 2, 3], 'BIC': [4472.447659014668, 3430.589791464587, 2623.1348841329664, 2207.2495561109813], 'AIC': [4463.016892241998, 3416.443641305583, 2604.2733505876276, 2183.6726391793077]}]
TimeSeriesUnivariate(test_context=TestContext(dataset=Dataset(raw_dataset=            MORTGAGE30US  UNRATE  GS10  FEDFUNDS
DATE                                            
1947-01-01           NaN     NaN   NaN       NaN
1947-02-01           NaN     NaN   NaN       NaN
1947-03-01           NaN     NaN   NaN       NaN
1947-04-01           NaN     NaN   NaN       NaN
1947-05-01           NaN     NaN   NaN       NaN
...                  ...     ...   ...       ...
2022-12-01          6.42     3.5  3.62      4.10
2023-01-01          6.13     3.4  3.53      4.33
2023-02-01          6.50     3.6  3.75      4.57
2023-03-01          6.32     3.5  3.66      4.65
2023-04-01          6.43     NaN  3.46       NaN

[916 rows x 4 columns], fields=[{'id': 'MORTGAGE30US', 'type': 'Numeric'}, {'id': 'UNRATE', 'type': 'Numeric'}, {'id': 'GS10', 'type': 'Numeric'}, {'id': 'FEDFUNDS', 'type': 'Numeric'}], sample=[{'id': 'head', 'data': [{'MORTGAGE30US': nan, 'UNRATE': nan, 'GS10': nan, 'FEDFUNDS': nan}, {'MORTGAGE30US': nan, 'UNRATE': nan, 'GS10': nan, 'FEDFUNDS': nan}, {'MORTGAGE30US': nan, 'UNRATE': nan, 'GS10': nan, 'FEDFUNDS': nan}, {'MORTGAGE30US': nan, 'UNRATE': nan, 'GS10': nan, 'FEDFUNDS': nan}, {'MORTGAGE30US': nan, 'UNRATE': nan, 'GS10': nan, 'FEDFUNDS': nan}]}, {'id': 'tail', 'data': [{'MORTGAGE30US': 6.42, 'UNRATE': 3.5, 'GS10': 3.62, 'FEDFUNDS': 4.1}, {'MORTGAGE30US': 6.13, 'UNRATE': 3.4, 'GS10': 3.53, 'FEDFUNDS': 4.33}, {'MORTGAGE30US': 6.5, 'UNRATE': 3.6, 'GS10': 3.75, 'FEDFUNDS': 4.57}, {'MORTGAGE30US': 6.32, 'UNRATE': 3.5, 'GS10': 3.66, 'FEDFUNDS': 4.65}, {'MORTGAGE30US': 6.43, 'UNRATE': nan, 'GS10': 3.46, 'FEDFUNDS': nan}]}], shape={'rows': 916, 'columns': 4}, correlation_matrix=None, correlations=None, type='training', options=None, statistics=None, targets=None, target_column=None, class_labels=None, _Dataset__feature_lookup={}, _Dataset__transformed_df=None), model=None, train_ds=None, test_ds=None, y_train_predict=None, y_test_predict=None, context_data={'seasonal_decompose': {'MORTGAGE30US': <statsmodels.tsa.seasonal.DecomposeResult object at 0x28b5cb1f0>, 'UNRATE': <statsmodels.tsa.seasonal.DecomposeResult object at 0x28bf29270>, 'GS10': <statsmodels.tsa.seasonal.DecomposeResult object at 0x28c5944c0>, 'FEDFUNDS': <statsmodels.tsa.seasonal.DecomposeResult object at 0x28c93f1c0>}}), config={...})

Multivariate Analysis

vm.test_plans.describe_plan("time_series_multivariate")
Attribute Value
ID time_series_multivariate
Name TimeSeriesMultivariate
Description Test plan to perform time series multivariate analysis.
Required Context['dataset']
Tests ScatterPlot (Metric), LaggedCorrelationHeatmap (Metric), EngleGrangerCoint (Metric), SpreadPlot (Metric)
Test Plans []
test_plan_config = {
    "scatter_plot": {
        "columns": target_column + feature_columns
    },
    "lagged_correlation_heatmap": {
        "target_col": target_column,
        "independent_vars": feature_columns
    },
    "engle_granger_coint": {
        "threshold": 0.05
    },
}
vm.run_test_plan("time_series_multivariate", config=test_plan_config, dataset=vm_dataset)
                                                                                                                                     

Results for Time Series Multivariate Test Plan:


This test plan provides a preliminary understanding of the features and relationship in multivariate dataset. It presents various multivariate visualizations that can help identify patterns, trends, and relationships between pairs of variables. The visualizations are designed to explore the relationships between multiple features simultaneously. They allow you to quickly identify any patterns or trends in the data, as well as any potential outliers or anomalies. The individual feature distribution can also be explored to provide insight into the range and frequency of values observed in the data. This multivariate analysis test plan aims to provide an overview of the data structure and guide further exploration and modeling.

Logged the following plot to the ValidMind platform:

Metric Plots

Logged the following plot to the ValidMind platform:

Metric Plots

Logged the following dataset metric to the ValidMind platform:

Metric Name
engle_granger_coint
Metric Type
dataset
Metric Scope
Metric Value
[{'Variable 1': 'MORTGAGE30US', 'Variable 2': 'UNRATE', 'Test': 'Engle-Granger', 'p-value': 0.6278763587562293, 'Threshold': 0.05, 'Pass/Fail': 'Fail', 'Decision': 'Not cointegrated'}, {'Variable 1': 'MORTGAGE30US', 'Variable 2': 'GS10', 'Test': 'Engle-Granger', 'p-value': 0.00868759943010125, 'Threshold': 0.05, 'Pass/Fail': 'Pass', 'Decision': 'Cointegrated'}, {'Variable 1': 'MORTGAGE30US', 'Variable 2': 'FEDFUNDS', 'Test': 'Engle-Granger', 'p-value': 0.02041650659325254, 'Threshold': 0.05, 'Pass/Fail': 'Pass', 'Decision': 'Cointegrated'}, {'Variable 1': 'UNRATE', 'Variable 2': 'GS10', 'Test': 'Engle-Granger', 'p-value': 0.013241818497465465, 'Threshold': 0.05, 'Pass/Fail': 'Pass', 'Decision': 'Cointegrated'}, {'Variable 1': 'UNRATE', 'Variable 2': 'FEDFUNDS', 'Test': 'Engle-Granger', 'p-value': 0.027578724502124778, 'Threshold': 0.05, 'Pass/Fail': 'Pass', 'Decision': 'Cointegrated'}, {'Variable 1': 'GS10', 'Variable 2': 'FEDFUNDS', 'Test': 'Engle-Granger', 'p-value': 0.005832335278420137, 'Threshold': 0.05,...

Logged the following plots to the ValidMind platform:

Metric Plots
TimeSeriesMultivariate(test_context=TestContext(dataset=Dataset(raw_dataset=            MORTGAGE30US  UNRATE  GS10  FEDFUNDS
DATE                                            
1947-01-01           NaN     NaN   NaN       NaN
1947-02-01           NaN     NaN   NaN       NaN
1947-03-01           NaN     NaN   NaN       NaN
1947-04-01           NaN     NaN   NaN       NaN
1947-05-01           NaN     NaN   NaN       NaN
...                  ...     ...   ...       ...
2022-12-01          6.42     3.5  3.62      4.10
2023-01-01          6.13     3.4  3.53      4.33
2023-02-01          6.50     3.6  3.75      4.57
2023-03-01          6.32     3.5  3.66      4.65
2023-04-01          6.43     NaN  3.46       NaN

[916 rows x 4 columns], fields=[{'id': 'MORTGAGE30US', 'type': 'Numeric'}, {'id': 'UNRATE', 'type': 'Numeric'}, {'id': 'GS10', 'type': 'Numeric'}, {'id': 'FEDFUNDS', 'type': 'Numeric'}], sample=[{'id': 'head', 'data': [{'MORTGAGE30US': nan, 'UNRATE': nan, 'GS10': nan, 'FEDFUNDS': nan}, {'MORTGAGE30US': nan, 'UNRATE': nan, 'GS10': nan, 'FEDFUNDS': nan}, {'MORTGAGE30US': nan, 'UNRATE': nan, 'GS10': nan, 'FEDFUNDS': nan}, {'MORTGAGE30US': nan, 'UNRATE': nan, 'GS10': nan, 'FEDFUNDS': nan}, {'MORTGAGE30US': nan, 'UNRATE': nan, 'GS10': nan, 'FEDFUNDS': nan}]}, {'id': 'tail', 'data': [{'MORTGAGE30US': 6.42, 'UNRATE': 3.5, 'GS10': 3.62, 'FEDFUNDS': 4.1}, {'MORTGAGE30US': 6.13, 'UNRATE': 3.4, 'GS10': 3.53, 'FEDFUNDS': 4.33}, {'MORTGAGE30US': 6.5, 'UNRATE': 3.6, 'GS10': 3.75, 'FEDFUNDS': 4.57}, {'MORTGAGE30US': 6.32, 'UNRATE': 3.5, 'GS10': 3.66, 'FEDFUNDS': 4.65}, {'MORTGAGE30US': 6.43, 'UNRATE': nan, 'GS10': 3.46, 'FEDFUNDS': nan}]}], shape={'rows': 916, 'columns': 4}, correlation_matrix=None, correlations=None, type='training', options=None, statistics=None, targets=None, target_column=None, class_labels=None, _Dataset__feature_lookup={}, _Dataset__transformed_df=None), model=None, train_ds=None, test_ds=None, y_train_predict=None, y_test_predict=None, context_data=None), config={...})